VSTH: a user-friendly web server for structure-based virtual screening on Tianhe-2

Abstract

Summary: VSTH is a user-friendly web server that provides a complete workflow for virtual screening. With its customized visualization software, users can interactively prepare protein files, set docking sites and view binding conformers in a target protein in a few clicks. Several purchasable ligand libraries are available for selection. Six open-source docking programs are integrated, either as computing engines or as conformational sampling tools for DLIGAND2. Users can select several docking methods simultaneously and customize the computing parameters. After docking, users can filter docking conformations by ranked scores, or by cluster-based molecular similarity to find highly populated clusters of low-energy conformations.

Availability and implementation: The VSTH web server is free and open to all users at https://matgen.nscc-gz.cn/VirtualScreening.html

Supplementary information: Supplementary data are available at Bioinformatics online.

Result

VSTH provides a complete workflow for molecular docking: PDB file preparation, pocket setting, molecular database preparation, docking program selection, task monitoring, and result analysis and visualization.

During input file preparation, users follow the prompts on the web page to provide a structure of the protein target, either by uploading a PDB file or by entering a PDB code to fetch it online. Users can also delete water molecules, ions, ligands and specific side chains (Fig. S2), and can add hydrogens to the target protein by clicking the "Preprocess" button. Three modes of hydrogen addition are available: add H and rotate and flip NQH groups; add H and rotate groups with no NQH flips; and add H, including His sc NH, then rotate and flip groups (Fig. S3).

VSTH provides three methods for defining the binding site. Option-1: VSTH lists the binding sites recommended by Fpocket (Le Guilloux et al., 2009) and automatically presents an interactive viewer of the protein target with the best-scoring pocket; users can select a binding site from up to 10 ranked pockets. Option-2: users define the binding site by the centroid of the bound ligand. Option-3: users supply the (X, Y, Z) coordinates of the box center and the box size directly. Users can also set the number of poses (Fig. S4).

Step 3 allows users to select libraries. The public libraries available in VSTH are DrugBank 5.0 (approved drug dataset; 2,387 molecules) (Wishart et al., 2018), HMDB 4.0 (13,875 molecules) (Wishart et al., 2007) and InterBioScreen 2020 (purchasable natural compound and synthetic compound datasets sold by InterBioScreen; 555,295 molecules). These libraries can be refined with filters on topological polar surface area, molecular weight, logP, number of hydrogen-bond acceptors or donors, rotatable bonds and rings.
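The library filters described above amount to range checks on per-molecule descriptors. The following is a minimal sketch of that idea, assuming the descriptors have already been computed; the `Ligand` class, the property names and the threshold values are hypothetical illustrations and are not taken from VSTH.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    """Precomputed descriptors for one library entry (hypothetical schema)."""
    name: str
    mol_weight: float      # molecular weight, Da
    logp: float            # octanol-water partition coefficient
    tpsa: float            # topological polar surface area, square angstroms
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int
    rings: int


# Each filter is an inclusive (min, max) range; None means "unbounded".
# These thresholds are illustrative only, not VSTH defaults.
EXAMPLE_FILTERS = {
    "mol_weight": (None, 500.0),
    "logp": (None, 5.0),
    "h_donors": (None, 5),
    "h_acceptors": (None, 10),
}


def passes(ligand, filters):
    """Return True if every filtered property lies inside its range."""
    for prop, (low, high) in filters.items():
        value = getattr(ligand, prop)
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


def filter_library(library, filters):
    """Keep only the ligands that satisfy all filters, preserving order."""
    return [lig for lig in library if passes(lig, filters)]


# Toy library with made-up descriptor values.
library = [
    Ligand("lig_a", 320.4, 2.1, 75.0, 2, 5, 4, 2),
    Ligand("lig_b", 612.7, 6.3, 140.0, 6, 12, 11, 4),
    Ligand("lig_c", 450.0, 4.9, 90.0, 1, 7, 6, 3),
]
kept = filter_library(library, EXAMPLE_FILTERS)
print([lig.name for lig in kept])
```

With this toy data, `lig_b` is rejected on molecular weight and the other two entries are kept. Expressing each filter as an optional minimum and maximum keeps one code path for all seven descriptor types.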
Users can also upload their own ligand libraries. VSTH supports the pdbqt, mol2, mol, sdf, sd and smi file formats. Multiple files can be uploaded as a zip or tar.gz archive (Fig. S5 and Fig. S6).

Step 4 allows users to select docking programs and set parameters. VSTH provides 6 docking programs (1). Two versions of AutoDock Vina are provided: v1.1.2 (Trott and Olson, 2009) is the default, and the more recent v1.2.3 (Eberhardt et al., 2021) can be selected instead. Users can select at most 3 programs at a time. VSTH supplies default docking parameters, which users can modify for personalized docking (2). All advanced parameters are listed per docking program in Table 2. In addition to these advanced parameters, users can choose DLIGAND2 (Chen et al., 2019) to re-score the docking conformations. For conformation classification, users can set the cutoff argument (Fig. S7).

[Table fragment] GalaxyDock3: E_CUTOFF (the e1 max), 1000000
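The text does not state which distance measure or algorithm lies behind the cutoff used for conformation classification. As a sketch only, the example below assumes pairwise RMSD between poses with identical atom ordering, a greedy "leader" clustering scheme, and docking scores where lower is better; the function names and the toy coordinates are hypothetical.

```python
import math


def pose_rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) atoms.

    Assumes identical atom ordering and performs no superposition.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def cluster_poses(poses, scores, cutoff):
    """Greedy leader clustering of poses.

    Poses are visited from best (lowest) score to worst. A pose joins the
    first cluster whose leader lies within `cutoff`; otherwise it starts a
    new cluster. Returns lists of pose indices, leader first.
    """
    order = sorted(range(len(poses)), key=lambda i: scores[i])
    clusters = []
    for i in order:
        for members in clusters:
            if pose_rmsd(poses[i], poses[members[0]]) <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy two-atom poses and made-up scores.
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
    [(10.0, 0.5, 0.0), (11.0, 0.5, 0.0)],
    [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)],
]
scores = [-9.1, -8.7, -8.9, -7.5, -8.0]

clusters = cluster_poses(poses, scores, cutoff=2.0)
largest = max(clusters, key=len)
print(clusters, largest)
```

Because poses are visited in score order, each cluster is led by its lowest-energy member, so the largest cluster gives a highly populated group together with its best-scoring representative.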

In Step 5, users can optionally enter their email address, which is used to send the task ID and a notification when the task has finished or been cancelled (Fig. S8 and Fig. S9). A dashboard is provided for checking job status; users supply a task identifier to retrieve the corresponding job. Depending on the progress of the job on the back-end server, its status is reported as pending, running or finished. Users can also cancel a job from the dashboard, for example if an error occurs. When the job has finished, users can download the protein file, the conformation file and the associated docking scores.

Comparisons with other web-based VS programs
As shown in Table S3, we compare VSTH with state-of-the-art structure-based web servers in terms of the functions they offer across the whole VS process. One unique function of VSTH is online protein processing, which keeps the data flow intact throughout the VS process and removes the need to process any data locally. Another feature of VSTH is that it provides multiple docking engines and a scoring function for re-scoring, which improves the reliability of its docking predictions. Combining the results of multiple docking engines and re-scoring the docking poses helps to offset the bias that an individual docking program may have, by design, toward certain types of proteins.
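The text does not specify how results from several engines are combined. One common and simple option is rank averaging, sketched below under the assumptions that lower scores are better for every engine and that only ligands scored by all engines are compared; the engine names, ligand names and scores are invented for illustration.

```python
def rank_scores(scores):
    """Map each ligand to its rank (1 = best), assuming lower score is better."""
    ordered = sorted(scores, key=lambda lig: scores[lig])
    return {lig: position for position, lig in enumerate(ordered, start=1)}


def consensus_ranking(results):
    """Average per-engine ranks into a single consensus ordering.

    `results` maps an engine name to a {ligand: score} dictionary. Only
    ligands scored by every engine are ranked. Returns a list of
    (ligand, mean_rank) tuples, best first; ties are broken by ligand name.
    """
    if not results:
        raise ValueError("at least one engine result is required")
    engines = list(results)
    common = set.intersection(*(set(results[e]) for e in engines))
    per_engine = {
        e: rank_scores({lig: results[e][lig] for lig in common})
        for e in engines
    }
    mean_rank = {
        lig: sum(per_engine[e][lig] for e in engines) / len(engines)
        for lig in common
    }
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores from two docking engines and one re-scoring function.
results = {
    "engine_a": {"lig1": -9.0, "lig2": -7.5, "lig3": -8.2},
    "engine_b": {"lig1": -7.0, "lig2": -6.8, "lig3": -7.4},
    "rescoring": {"lig1": -5.0, "lig2": -3.0, "lig3": -4.9},
}
ranking = consensus_ranking(results)
print(ranking)
```

Ranks are used instead of raw scores because different programs report scores on different scales, so a ligand favoured by a single engine does not dominate the consensus.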

Case Study
We demonstrate the capabilities of VSTH in identifying decoys at a known binding site and in screening potential compounds at an unknown binding site in the following evaluations.

Study 1: Evaluating VSTH on targets with a known binding site
To assess the performance of the docking methods in predicting the structures of protein complexes, we used two benchmarks. We first docked 10 receptor-ligand complexes obtained from the DUD-E database (Mysinger et al., 2012) using AutoDock Vina and re-scored the docking poses with DLIGAND2. The average root-mean-square deviations (RMSD) for AutoDock Vina and DLIGAND2 are 5.47 Å and 2.1 Å, respectively (Supplementary Table S4), indicating that DLIGAND2 improves the identification of correct poses. We then used the decoy libraries for the same dataset to assess the ability to separate active molecules from decoys. DLIGAND2 achieved better enrichment factors (EF) than AutoDock Vina, with EF 1% of 10.01, EF 5% of 4.31 and EF 10% of 2.95 (Table S5).
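For readers unfamiliar with the metric, the enrichment factor at a fraction x is conventionally the hit rate among the top-ranked x of the library divided by the hit rate in the whole library. The sketch below uses that standard definition with toy data; it assumes lower scores rank first, rounds the size of the top subset to the nearest integer, and is not the evaluation script used for Table S5.

```python
def enrichment_factor(scores, is_active, fraction):
    """Enrichment factor at the given fraction of a ranked library.

    EF = (actives in top subset / size of top subset)
         / (total actives / library size)

    `scores` are assumed to rank better compounds lower; `is_active` is a
    parallel list of booleans marking the known actives.
    """
    n_total = len(scores)
    if n_total == 0 or len(is_active) != n_total:
        raise ValueError("scores and is_active must be non-empty and aligned")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(range(n_total), key=lambda i: scores[i])
    hits = sum(is_active[i] for i in ranked[:n_top])
    # Algebraically the same as (hits / n_top) / (n_actives / n_total).
    return (hits * n_total) / (n_top * n_actives)


# Toy library: 100 compounds already in score order, 10 of them active.
scores = [float(i) for i in range(100)]
active_positions = {0, 1, 3, 7, 12, 20, 35, 50, 70, 90}
is_active = [i in active_positions for i in range(100)]

for fraction in (0.01, 0.05, 0.10):
    print(fraction, enrichment_factor(scores, is_active, fraction))
```

An EF of 1 corresponds to random selection, so in this toy ranking the top 10% holds four times as many actives as chance would give. The maximum attainable EF at fraction x is bounded by both 1/x and the inverse of the active fraction in the library.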